Search | BVS Regional Portal

1.

Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.

Uddin, Shahadat; Lu, Haohui.

PLoS One ; 19(4): e0301541, 2024.

Article in English | MEDLINE | ID: mdl-38635591

ABSTRACT

Many individual studies in the literature observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML (Decision tree and Random forest) and three non-tree-based ML (Support vector machine, Logistic regression and k-nearest neighbour) algorithms. Results from paired-sample t-tests show that both tree-based ML algorithms reveal better performance than each non-tree-based ML algorithm for the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at p<0.001 significance level. This performance superiority is consistent across both the model development and test phases. This study also used paired-sample t-tests for the subsets of the research datasets from disease prediction (66) and university-ranking (50) research contexts for further validation. The observed superiority of the tree-based ML algorithms remains valid for these subsets. Tree-based ML algorithms significantly outperformed non-tree-based algorithms for these two research contexts for all four performance measures. We discuss the research implications of these findings in detail in this article.

Subjects

Algorithms, Machine Learning, Humans, Support Vector Machine, Logistic Models
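
As an illustration of the paired-sample t-test named in the abstract of record 1, the following minimal Python sketch computes the t statistic and degrees of freedom for two algorithms scored on the same datasets. The function name `paired_t_statistic` and the accuracy values are hypothetical and are not taken from the paper. The sketch stops short of the p-value, which needs the t distribution (available, for example, through `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired-sample t-test."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    # Work on per-dataset differences, since both scores share the same dataset.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-dataset accuracies (illustrative only, not from the study).
rf_acc = [0.91, 0.88, 0.95, 0.79, 0.86]  # e.g. a tree-based model
lr_acc = [0.87, 0.85, 0.90, 0.78, 0.80]  # e.g. a non-tree-based model
t, df = paired_t_statistic(rf_acc, lr_acc)
print(f"t = {t:.3f}, df = {df}")
```

A positive t here means the first algorithm scored higher on average across the paired datasets; whether that difference is significant depends on the p-value for the given degrees of freedom.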

2.

An NLP-based novel approach for assessing national influence in clause dissemination across bilateral investment treaties.

Uddin, Shahadat; Lu, Haohui; Alschner, Wolfgang; Patay, Dori; Frank, Nicholas; Gomes, Fabio S; Thow, Anne Marie.

PLoS One ; 19(3): e0298380, 2024.

Article in English | MEDLINE | ID: mdl-38470902

ABSTRACT

International investment agreements (IIAs) promote foreign investment. However, they can undermine crucial health programs, creating a dilemma for governments between corporate and public health interests. For this reason, including clauses that safeguard health has become an essential practice in IIAs. According to the current literature, some countries have played a pivotal role in leading this inclusion, while others have followed them. However, the existing literature lacks an approach that can quantify the strength of a country's influence in disseminating clauses that explicitly mention health provisions to others. Following an NLP (Natural Language Processing)-based text similarity analysis of Bilateral Investment Treaties (BITs), this study proposes a metric, 'Influence' (INF), which provides insights into the role of different countries or regions in the propagation of IIA texts among BITs. We demonstrate a comprehensive application of this metric using a large agreement dataset. Our findings from this application corroborate the evidence in the current literature, supporting the validity of the proposed metric. According to the INF, Germany, Canada, and Brazil emerged as the most influential players in defensive, neutral, and offensive health mentions, respectively. These countries wield substantial bargaining power in international investment law and policy, and their innovative approaches to BITs set a path for others to follow. These countries provide crucial insights into the direction and sources of influence of international investment regulations to safeguard health. The proposed metric is of substantial use to policymakers and investors. It can help them identify key countries in IIA text dissemination and create new policy guidelines to safeguard health while balancing economic development and public health protection. A software tool based on the proposed INF measure can be found at https://inftool.com/.

Subjects

Commerce, International Cooperation, Internationality, Public Health, Investments in Health

3.

Dataset meta-level and statistical features affect machine learning performance.

Uddin, Shahadat; Lu, Haohui.

Sci Rep ; 14(1): 1670, 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38238551

ABSTRACT

What dataset features affect machine learning (ML) performance has primarily been unknown in the current literature. This study examines the impact of tabular datasets' different meta-level and statistical features on the performance of various ML algorithms. The three meta-level features this study considered are the dataset size, the number of attributes and the ratio between the positive (class 1) and negative (class 0) class instances. It considered four statistical features for each dataset: mean, standard deviation, skewness and kurtosis. After applying the required scaling, this study averaged (uniform and weighted) each dataset's different attributes to quantify its four statistical features. We analysed 200 open-access tabular datasets from the Kaggle (147) and UCI Machine Learning Repository (53) and developed ML classification models (through classification implementation and hyperparameter tuning) for each dataset. Then, this study developed multiple regression models to explore the impact of dataset features on ML performance. We found that kurtosis has a statistically significant negative effect on the accuracy of the three non-tree-based ML algorithms of the Support vector machine (SVM), Logistic regression (LR) and K-nearest neighbour (KNN) for their classical implementation with both uniform and weighted aggregations. This study observed similar findings in most cases for ML implementations through hyperparameter tuning, except for SVM with weighted aggregation. Meta-level and statistical features barely show any statistically significant impact on the accuracy of the two tree-based ML algorithms (Decision tree and Random forest), except for implementation through hyperparameter tuning for the weighted aggregation. 
When we excluded some datasets based on the imbalanced statistics and a significantly higher contribution of one attribute compared to others to the classification performance, we found a significant effect of the meta-level ratio feature and statistical mean and standard deviation features on SVM, LR and KNN accuracy in many cases. Our findings open a new research direction in understanding how dataset characteristics affect ML performance and will help researchers select appropriate ML algorithms for a possible optimal accuracy outcome.
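
The four statistical features described in the abstract of record 3 (mean, standard deviation, skewness and kurtosis, averaged over a dataset's attributes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the scaling method or the moment conventions, so min-max scaling, population moments and non-excess kurtosis are assumed here, only the uniform average is shown, and all function names are hypothetical.

```python
import math


def min_max_scale(values):
    """Scale values to [0, 1]; ASSUMED scaling, the abstract does not name one."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def column_stats(values):
    """Population mean, standard deviation, skewness and non-excess kurtosis."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        return mu, 0.0, 0.0, 0.0  # constant column: shape statistics set to 0
    return mu, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def dataset_features(columns):
    """Uniform (unweighted) average of per-attribute statistics."""
    stats = [column_stats(min_max_scale(col)) for col in columns]
    return tuple(sum(s[i] for s in stats) / len(stats) for i in range(4))


# Hypothetical two-attribute dataset, one list per attribute.
toy_columns = [[0, 1, 2, 3, 4], [10, 10, 10, 20, 60]]
mean_f, sd_f, skew_f, kurt_f = dataset_features(toy_columns)
print(mean_f, sd_f, skew_f, kurt_f)
```

The weighted aggregation mentioned in the abstract would replace the plain average in `dataset_features` with a weighted one; the weights are not described in the abstract, so they are left out of this sketch.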

4.

Road networks and socio-demographic factors to explore COVID-19 infection during its different waves.

Uddin, Shahadat; Khan, Arif; Lu, Haohui; Zhou, Fangyu; Karim, Shakir; Hajati, Farshid; Moni, Mohammad Ali.

Sci Rep ; 14(1): 1551, 2024 01 18.

Article in English | MEDLINE | ID: mdl-38233430

ABSTRACT

The COVID-19 pandemic triggered an unprecedented level of restrictive measures globally. Most countries resorted to lockdowns at some point to buy the much-needed time for flattening the curve and scaling up vaccination and treatment capacity. Although lockdowns, social distancing and business closures generally slowed the case growth, there is a growing concern about these restrictions' social, economic and psychological impact, especially on the disadvantaged and poorer segments of society. While we are all in this together, these segments often take the heavier toll of the pandemic and face harsher restrictions or get blamed for community transmission. This study proposes a road-network-based networked approach to model mobility patterns between localities during lockdown stages. It utilises a panel regression method to analyse the effects of mobility in transmitting COVID-19 in an Australian context, together with a close look at a suburban population's characteristics like their age, income and education. Firstly, we attempt to model how the local road networks between the neighbouring suburbs (i.e., neighbourhood measure) and current infection count affect the case growth and how they differ between delta and omicron variants. We use a geographic information system, population and infection data to measure road connections, mobility and transmission probability across the suburbs. We then looked at three socio-demographic variables: age, education and income and explored how they moderate independent and dependent variables (infection rates and neighbourhood measures). The result shows strong model performance to predict infection rate based on neighbourhood road connection. However, apart from age in the delta variant context, the other variables (income and education level) do not seem to moderate the relationship between infection rate and neighbourhood measure. 
The results indicate that suburbs with a more socio-economically disadvantaged population do not necessarily contribute to more community transmission. The study findings could be potentially helpful for stakeholders in tailoring any health decision for future pandemics.

Subjects

COVID-19, Humans, Australia/epidemiology, COVID-19/epidemiology, Communicable Disease Control, Pandemics, SARS-CoV-2, Demography

5.

Disease Prediction Using Graph Machine Learning Based on Electronic Health Data: A Review of Approaches and Trends.

Lu, Haohui; Uddin, Shahadat.

Healthcare (Basel) ; 11(7), 2023 Apr 04.

Article in English | MEDLINE | ID: mdl-37046958

ABSTRACT

Graph machine-learning (ML) methods have recently attracted great attention and have made significant progress in graph applications. To date, most graph ML approaches have been evaluated on social networks, but they have not been comprehensively reviewed in the health informatics domain. Herein, a review of graph ML methods and their applications in the disease prediction domain based on electronic health data is presented in this study from two levels: node classification and link prediction. Commonly used graph ML approaches for these two levels are shallow embedding and graph neural networks (GNN). This study performs comprehensive research to identify articles that applied or proposed graph ML models on disease prediction using electronic health data. We considered journals and conferences from four digital library databases (i.e., PubMed, Scopus, ACM digital library, and IEEEXplore). Based on the identified articles, we review the present status of and trends in graph ML approaches for disease prediction using electronic health data. Even though GNN-based models have achieved outstanding results compared with the traditional ML methods in a wide range of disease prediction tasks, they still confront interpretability and dynamic graph challenges. Though the disease prediction field using ML techniques is still emerging, GNN-based models have the potential to be an excellent approach for disease prediction, which can be used in medical diagnosis, treatment, and the prognosis of diseases.

6.

Embedding-based link predictions to explore latent comorbidity of chronic diseases.

Lu, Haohui; Uddin, Shahadat.

Health Inf Sci Syst ; 11(1): 2, 2023 Dec.

Article in English | MEDLINE | ID: mdl-36593862

ABSTRACT

Purpose: Comorbidity is a term used to describe when a patient simultaneously has more than one chronic disease. Comorbidity is a significant health issue that affects people worldwide. This study aims to use machine learning and graph theory to predict the comorbidity of chronic diseases. Methods: A patient-disease bipartite graph is constructed based on the administrative claim data. The bipartite graph projection approach was used to create the comorbidity network. For the link prediction task, three graph machine learning embedding-based models (node2vec, graph neural networks and hand-crafted approach) with different variants were used on the comorbidity network to compare their performance. This study also considered three commonly used similarity-based link prediction approaches (Jaccard coefficient, Adamic-Adar index and Resource allocation index) for performance comparison. Results: The results showed that the embedding-based hand-crafted features technique achieved outstanding performance compared with the remaining similarity-based and embedding-based models. Especially, the hand-crafted technique with the extreme gradient boosting classifier achieved the highest accuracy (91.67%), followed by the same technique with the Logistic regression classifier (90.26%). For this shallow embedding method, the Jaccard coefficient and the degree centrality of the original chronic disease were the most important features for comorbidity prediction. Conclusion: The proposed framework can be used to predict the comorbidity of chronic disease at an early stage of hospital admission. Thus, the prediction outcome could be valuable for medical practice, giving healthcare providers more control over their services and lowering expenses.
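
The three similarity-based link prediction scores named in the abstract of record 6 (Jaccard coefficient, Adamic-Adar index and Resource allocation index) follow standard definitions, sketched below in plain Python on an undirected graph. The toy comorbidity edges and the function names are hypothetical and are not drawn from the study's claim data. The sketch assumes no self-loops and two distinct query nodes, so every common neighbour has degree at least 2.

```python
import math


def neighbours(edges):
    """Build an undirected adjacency map {node: set_of_neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def resource_allocation(adj, u, v):
    """Sum of 1 / degree over the common neighbours of u and v."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


# Hypothetical comorbidity network (illustrative only).
edges = [
    ("diabetes", "hypertension"),
    ("diabetes", "ckd"),
    ("diabetes", "copd"),
    ("hypertension", "ckd"),
    ("hypertension", "heart_failure"),
    ("ckd", "heart_failure"),
]
adj = neighbours(edges)
pair = ("diabetes", "heart_failure")
print(jaccard(adj, *pair), adamic_adar(adj, *pair), resource_allocation(adj, *pair))
```

A higher score for a currently unlinked pair suggests a more likely future link, which in a comorbidity network would correspond to a candidate latent comorbidity.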

7.

Comorbidity progression patterns of major chronic diseases: The impact of age, gender and time-window.

Uddin, Shahadat; Wang, Shangzhou; Khan, Arif; Lu, Haohui.

Chronic Illn ; 19(2): 304-313, 2023 06.

Article in English | MEDLINE | ID: mdl-35306857

ABSTRACT

OBJECTIVE: The presence of one chronic disease often leads to the development of one or more other chronic diseases. This study examines whether there are significant progressions between chronic diseases and identifies the risk factors that influence them. METHODS: This study used an administrative healthcare dataset sample from 29,280 hospitalized patients over 24 years (1995 to 2018, inclusive) to explore the progression of common chronic diseases and their major comorbidities. An Australian health insurance organization provided the dataset. We used the t-test to examine the statistical significance of progression between chronic diseases. A network analysis approach is followed to rank different chronic diseases contributing to disease progressions. RESULTS: We found that few chronic diseases (e.g. cardiovascular diseases and diabetes) have a high prevalence in progressing to other chronic diseases, which is statistically significant at p ≤ 0.05. These progression frequencies significantly increase with time and age. We also found that patients' sex differently affects the disease progressions. DISCUSSION: This study found that some chronic diseases have a high prevalence in comorbidity progressions. In addition, the progression statistics differ with time and age. The results of this study can help researchers, stakeholders, and policymakers gain insights into the disease transitions and act as a guiding tool to assess future disease burden and plan accordingly.

Subjects

Cardiovascular Diseases, Diabetes Mellitus, Humans, Australia/epidemiology, Comorbidity, Chronic Disease, Diabetes Mellitus/epidemiology, Cardiovascular Diseases/epidemiology

8.

Machine learning in project analytics: a data-driven framework and case study.

Uddin, Shahadat; Ong, Stephen; Lu, Haohui.

Sci Rep ; 12(1): 15252, 2022 09 09.

Article in English | MEDLINE | ID: mdl-36085353

ABSTRACT

The analytic procedures incorporated to facilitate the delivery of projects are often referred to as project analytics. Existing techniques focus on retrospective reporting and understanding the underlying relationships to make informed decisions. Although machine learning algorithms have been widely used in addressing problems within various contexts (e.g., streamlining the design of construction projects), few studies have evaluated pre-existing machine learning methods within the delivery of construction projects. Due to this, the current research aims to contribute further to this convergence between artificial intelligence and the execution of construction projects through the evaluation of a specific set of machine learning algorithms. This study proposes a machine learning-based data-driven research framework for addressing problems related to project analytics. It then illustrates an example of the application of this framework. In this illustration, existing data from an open-source data repository on construction projects and cost overrun frequencies was studied, and several machine learning models (from Python's Scikit-learn package) were tested and evaluated. The data consisted of 44 independent variables (from materials to labour and contracting) and one dependent variable (project cost overrun frequency), which has been categorised for processing under several machine learning models. These models include support vector machine, logistic regression, k-nearest neighbour, random forest, stacking (ensemble) model and artificial neural network. Feature selection and evaluation methods, including the Univariate feature selection, Recursive feature elimination, SelectFromModel and confusion matrix, were applied to determine the most accurate prediction model. This study also discusses the generalisability of using the proposed research framework in other research contexts within the field of project management. 
The proposed framework, its illustration in the context of construction projects and its potential to be adopted in different contexts will significantly contribute to project practitioners, stakeholders and academics in addressing many project-related issues.

Subjects

Artificial Intelligence, Machine Learning, Logistic Models, Retrospective Studies, Support Vector Machine

9.

Comparing the Impact of Road Networks on COVID-19 Severity between Delta and Omicron Variants: A Study Based on Greater Sydney (Australia) Suburbs.

Uddin, Shahadat; Lu, Haohui; Khan, Arif; Karim, Shakir; Zhou, Fangyu.

Int J Environ Res Public Health ; 19(11), 2022 05 27.

Article in English | MEDLINE | ID: mdl-35682134

ABSTRACT

The Omicron and Delta variants of COVID-19 have recently become the most dominant virus strains worldwide. A recent study on the Delta variant found that a suburban road network provides a reliable proxy for human mobility to explore COVID-19 severity. This study first examines the impact of road networks on COVID-19 severity for the Omicron variant using the infection and road connections data from Greater Sydney, Australia. We then compare the findings of this study with a recent study that used the infection data of the Delta variant for the same region. In analysing the road network, we used four centrality measures (degree, closeness, betweenness and eigenvector) and the coreness measure. We developed two multiple linear regression models for Delta and Omicron variants using the same set of independent and dependent variables. Only eigenvector is a statistically significant predictor for COVID-19 severity for the Omicron variant. On the other hand, both degree and eigenvector are statistically significant predictors for the Delta variant, as found in a recent study considered for comparison. We further found a statistical difference (p < 0.05) between the R-squared values for these two multiple linear regression models. Our findings point to an important difference in the transmission nature of Delta and Omicron variants, which could provide practical insights into understanding their infectious nature and developing appropriate control strategies accordingly.

Subjects

COVID-19, Australia/epidemiology, COVID-19/epidemiology, Humans, SARS-CoV-2/genetics
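
Two of the centrality measures used in the road-network studies (records 9 and 11), degree and closeness, can be computed on a small graph as in the sketch below. It assumes an unweighted, undirected and connected network; the suburb labels A to D and the function names are hypothetical, and the road links are not taken from the Greater Sydney data.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set_of_neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj, source):
    """Reachable nodes divided by the sum of shortest-path lengths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives hop counts in an unweighted graph
        current = queue.popleft()
        for nxt in adj[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical road links between four suburbs (illustrative only).
roads = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
adj = build_adjacency(roads)
deg = degree_centrality(adj)
print(deg, closeness_centrality(adj, "A"))
```

In this toy network suburb B connects to every other suburb, so it has the highest degree and closeness; in the studies such scores serve as independent variables in regression models of COVID-19 outcomes.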

10.

Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction.

Uddin, Shahadat; Haque, Ibtisham; Lu, Haohui; Moni, Mohammad Ali; Gide, Ergun.

Sci Rep ; 12(1): 6256, 2022 04 15.

Article in English | MEDLINE | ID: mdl-35428863

ABSTRACT

Disease risk prediction is a rising challenge in the medical domain. Researchers have widely used machine learning algorithms to solve this challenge. The k-nearest neighbour (KNN) algorithm is the most frequently used among the wide range of machine learning algorithms. This paper presents a study on different KNN variants (Classic one, Adaptive, Locally adaptive, k-means clustering, Fuzzy, Mutual, Ensemble, Hassanat and Generalised mean distance) and their performance comparison for disease prediction. This study analysed these variants in-depth through implementations and experimentations using eight machine learning benchmark datasets obtained from Kaggle, UCI Machine learning repository and OpenML. The datasets were related to different disease contexts. We considered the performance measures of accuracy, precision and recall for comparative analysis. The average accuracy values of these variants ranged from 64.22% to 83.62%. The Hassanat KNN showed the highest average accuracy (83.62%), followed by the ensemble approach KNN (82.34%). A relative performance index is also proposed based on each performance measure to assess each variant and compare the results. This study identified Hassanat KNN as the best performing variant based on the accuracy-based version of this index, followed by the ensemble approach KNN. This study also provided a relative comparison among KNN variants based on precision and recall measures. Finally, this paper summarises which KNN variant is the most promising candidate to follow under the consideration of three performance measures (accuracy, precision and recall) for disease prediction. Healthcare researchers and stakeholders could use the findings of this study to select the appropriate KNN variant for predictive disease risk analytics.

Subjects

Algorithms, Machine Learning, Cluster Analysis
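
For orientation, the classic KNN variant listed in the abstract of record 10 can be sketched in a few lines of Python. This is a generic textbook version that assumes Euclidean distance and an unweighted majority vote; it is not the paper's implementation, it does not cover the other variants (for example Hassanat or ensemble KNN), and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Rank training points by Euclidean distance to the query (Python 3.8+).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records: label 0 = no disease, 1 = disease.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (8.5, 8.5)))
```

Choosing an odd `k` for a two-class problem avoids tied votes; the variants compared in the study differ mainly in how neighbours are selected, weighted or measured.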

11.

Suburban Road Networks to Explore COVID-19 Vulnerability and Severity.

Uddin, Shahadat; Khan, Arif; Lu, Haohui; Zhou, Fangyu; Karim, Shakir.

Int J Environ Res Public Health ; 19(4), 2022 02 11.

Article in English | MEDLINE | ID: mdl-35206227

ABSTRACT

The Delta variant of COVID-19 has been found to be extremely difficult to contain worldwide. The complex dynamics of human mobility and the variable intensity of local outbreaks make measuring the factors of COVID-19 transmission a challenge. The inter-suburb road connection details provide a reliable proxy of the moving options for people between suburbs for a given region. By using such data from Greater Sydney, Australia, this study explored the impact of suburban road networks on two COVID-19-related outcomes measures. The first measure is COVID-19 vulnerability, which gives a low score to a more vulnerable suburb. A suburb is more vulnerable if it has the first COVID-19 case earlier and vice versa. The second measure is COVID-19 severity, which is proportionate to the number of COVID-19-positive cases for a suburb. To analyze the suburban road network, we considered four centrality measures (degree, closeness, betweenness and eigenvector) and core-periphery structure. We found that the degree centrality measure of the suburban road network was a strong and statistically significant predictor for both COVID-19 vulnerability and severity. Closeness centrality and eigenvector centrality were also statistically significant predictors for COVID-19 vulnerability and severity, respectively. The findings of this study could provide practical insights to stakeholders and policymakers to develop timely strategies and policies to prevent and contain any highly infectious pandemics, including the Delta variant of COVID-19.

Subjects

COVID-19, Australia, COVID-19/epidemiology, Humans, Pandemics, SARS-CoV-2

12.

A weighted patient network-based framework for predicting chronic diseases using graph neural networks.

Lu, Haohui; Uddin, Shahadat.

Sci Rep ; 11(1): 22607, 2021 11 19.

Article in English | MEDLINE | ID: mdl-34799627

ABSTRACT

Chronic disease prediction is a critical task in healthcare. Existing studies fulfil this requirement by employing machine learning techniques based on patient features, but they suffer from high dimensional data problems and a high level of bias. We propose a framework for predicting chronic disease based on Graph Neural Networks (GNNs) to address these issues. We begin by projecting a patient-disease bipartite graph to create a weighted patient network (WPN) that extracts the latent relationship among patients. We then use GNN-based techniques to build prediction models. These models use features extracted from WPN to create robust patient representations for chronic disease prediction. We compare the output of GNN-based models to machine learning methods by using cardiovascular disease and chronic pulmonary disease. The results show that our framework enhances the accuracy of chronic disease prediction. The model with attention mechanisms achieves an accuracy of 93.49% for cardiovascular disease prediction and 89.15% for chronic pulmonary disease prediction. Furthermore, the visualisation of the last hidden layers of GNN-based models shows the pattern for the two cohorts, demonstrating the discriminative strength of the framework. The proposed framework can help stakeholders improve health management systems for patients at risk of developing chronic diseases and conditions.

Subjects

Cardiovascular Diseases/diagnosis, Chronic Disease, Neural Networks, Computer, Pulmonary Disease, Chronic Obstructive/diagnosis, Software, Algorithms, Data Interpretation, Statistical, Databases, Factual, Female, Humans, Machine Learning, Male, Programming Languages, Reproducibility of Results, Risk, Translational Research, Biomedical
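
The projection step described in the abstract of record 12, from a patient-disease bipartite graph to a weighted patient network, can be sketched as below. The abstract does not state how edge weights are defined, so this sketch assumes the weight between two patients is the number of diseases they share; the patient identifiers, diseases and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_patients(patient_diseases):
    """Map each patient pair to the number of diseases they share (ASSUMED weight)."""
    patients_by_disease = defaultdict(set)
    for patient, diseases in patient_diseases.items():
        for disease in diseases:
            patients_by_disease[disease].add(patient)
    weights = defaultdict(int)
    for patients in patients_by_disease.values():
        # Every pair of patients with this disease gains one unit of weight.
        for p, q in combinations(sorted(patients), 2):
            weights[(p, q)] += 1
    return dict(weights)


# Hypothetical bipartite data: patient -> set of diagnosed diseases.
records = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"diabetes", "hypertension", "asthma"},
    "p3": {"asthma"},
}
wpn = project_patients(records)
print(wpn)
```

The resulting weighted edges could then be passed, together with patient features, to a graph learning library for node classification; that step is outside this sketch.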

13.

Melt extrusion deposition (MED™) 3D printing technology - A paradigm shift in design and development of modified release drug products.

Zheng, Yu; Deng, Feihuang; Wang, Bo; Wu, Yue; Luo, Qing; Zuo, Xianghao; Liu, Xin; Cao, Lihua; Li, Min; Lu, Haohui; Cheng, Senping; Li, Xiaoling.

Int J Pharm ; 602: 120639, 2021 Jun 01.

Article in English | MEDLINE | ID: mdl-33901601

ABSTRACT

Three-dimensional printing (3DP) technology offers unique advantages for pharmaceutical applications. However, most current 3D printing methods and instruments are not specifically designed and developed for pharmaceutical applications. To meet the needs of pharmaceutical applications for precision, compatibility with a wide range of pharmaceutical excipients and drug materials without additional processing, high throughput and GMP compliance, an extrusion-based 3D printer based on Melt Extrusion Deposition (MED™) 3D printing technology was developed in this study. This technology can process powder pharmaceutical excipients and drugs directly without the need to prepare filament, as required by FDM 3D printing. Six different tablet designs based on compartment models were used to demonstrate the precision and reproducibility of this technology. The designed tablets were fabricated using the GMP-compliant MED™ 3D printer and were evaluated in vitro for drug release and, for selected designs, in vivo using male beagle dogs. Tablet designs with one or more compartments showed versatile release characteristics in modulating the release onset time, release kinetics, duration of release and mode of release. Multiple drugs or formulations were fabricated into a single tablet to achieve independent release kinetics for each drug or to fine-tune the pharmacokinetic profile of a drug. Building upon the theoretical analysis of models and the precision and reproducibility of MED™ 3D printing technology, a novel product development approach, 3D printing formulation by design (3DPFbD®), was developed to provide an efficient tool for rapid pharmaceutical product development. MED™ 3D printing represents a novel and promising technology platform encompassing the design and development of modified drug release products and has the potential to impact drug delivery and pharmaceutical product development.

Subjects

Excipients, Printing, Three-Dimensional, Animals, Dogs, Drug Liberation, Male, Reproducibility of Results, Tablets, Technology, Pharmaceutical
